IEEE Transactions on Medical Imaging
● Institute of Electrical and Electronics Engineers (IEEE)
Preprints posted in the last 90 days, ranked by how well they match IEEE Transactions on Medical Imaging's content profile, based on 18 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit.
Li, S.; Gao, J.; Kim, C.; Choi, S.; Huang, H.; Wang, X.; Shi, J.; Chen, Q.; Wang, Y.; Wu, S.; Zhang, Y.; Huang, T.; Zhou, Y.; Yao, B.; Yao, Y.; Li, C.
Three-dimensional photoacoustic imaging (3D PAI) commonly relies on sparse sensor arrays, which restrict angular sampling, detection aperture, and instantaneous field-of-view (FOV). Moving the sensor array relative to the target provides an effective route to multi-view imaging and large volume photoacoustic mapping, but accurate fusion of multiple poses conventionally depends on motor feedback or external tracking hardware. Such tracking increases system complexity and can suffer from calibration errors, backlash, and motion instability. Here we introduce PA-SfM, a tracker-free differentiable acoustic structure-from-motion (SfM) framework that recovers sensor array poses directly from photoacoustic measurements. By integrating a differentiable acoustic radiation model with hierarchical optimization and rigid-array constraints, PA-SfM jointly estimates inter-view transformations and reconstructs 3D photoacoustic volumes without external pose measurements. We validated PA-SfM using numerical simulations, in vivo rat kidney and liver imaging with known relative geometry, and a mechanically scanned 3D PAI system. In mechanically rotated mouse liver imaging, PA-SfM produced sharper and more continuous vascular reconstructions than encoder-based fusion. In translational multi-pose imaging, PA-SfM supported expanded FOV vascular mapping without translation-stage pose input. In controlled quantitative validations, PA-SfM achieved high reconstruction fidelity, with PSNRs of 38.90-41.42 dB and SSIMs of 0.9637-0.9864 relative to groundtruth or known-pose reference reconstructions. These results establish PA-SfM as a robust computational framework for tracker-free multi-view and expanded FOV 3D PAI, providing a complete algorithmic foundation for freehand 3D PAI. The source code is publicly available at https://github.com/JaegerCQ/PA-SfM.
Zhang, G.; Leroy, H.; Rideau, B.; Reygrobellet, A.; Pernot, M.; Deffieux, T.; Ialy-Radio, N.; Pezet, S.; Tanter, M.
Microbubble contrast-enhanced ultrasound (CEUS) relies on discriminating nonlinear bubble signals from linear tissue backscattering. While Singular Value Decomposition (SVD) filtering improves this discrimination, existing techniques often fail to separate slowly moving microbubble signals from static clutter. Here, we present a novel multi-stage singular value decomposition (MS-SVD) framework for ultrafast CEUS imaging. Our method employs plane-wave transmissions at multiple angles and acoustic pressure levels (implemented via duty-cycle modulation) with alternating transmit polarity. The beamformed data are then processed by three sequential SVD filters: (1) spatial-angular SVD to extract coherent signals across all transmit angles, (2) spatial-pressure SVD to separate linear fundamental and nonlinear harmonic components, and (3) spatiotemporal SVD to isolate moving microbubble echoes from tissue clutter. In in vitro flow phantoms and in the in vivo rat brain imaged through a cranial window, MS-SVD improves microbubble detection compared with conventional SVD filtering, yielding stronger vascular contrast and suppressing tissue clutter to a greater extent. The resulting power-Doppler and super-resolution maps are notably cleaner and more complete: MS-SVD detects substantially more microbubble events in ULM, revealing finer vessel details and more accurate flow speeds. By capturing the full acoustic signature of microbubbles (both fundamental and harmonic), MS-SVD achieves higher contrast-to-noise and sensitivity in CEUS. These gains make it a powerful front-end for super-resolution ultrasound localization microscopy and other high-sensitivity microvascular imaging applications.
Maidu, B.; Gonzalo, A.; Guerrero-Hurtado, M.; Bargellini, C.; Martinez-Legazpi, P.; Bermejo, J.; Contijoch, F.; Flores, O.; Garcia-Villalba, M.; McVeigh, E.; Kahn, A.; del Alamo, J. C.
Atrial fibrillation (AF) promotes blood stasis and thrombus formation, most often within the left atrial appendage (LAA), and can lead to stroke or transient ischemic attack (TIA). Time-resolved contrast-enhanced computed tomography (4D CT) captures left atrial (LA) opacification and washout, but it does not directly provide quantitative stasis metrics such as blood residence time. Patient-specific computational fluid dynamics (CFD) can quantify LA/LAA residence time, yet routine clinical use is limited by computational cost and sensitivity to patient-specific boundary conditions. Here, we present two complementary approaches to infer time-resolved 3D residence time fields directly from contrast dynamics. First, a physics-informed neural network (PINN) treats contrast as a passive scalar and jointly reconstructs velocity and residence time by enforcing the incompressible Navier-Stokes equations and transport equations for contrast concentration and residence time in moving, patient-specific LA anatomies. Second, an indicator dilution theory (IDT) formulation computes voxelwise, time-resolved residence time maps from contrast time curves alone by constructing a PV-referenced impulse response and modeling transport with a tank-in-series model with spatially dependent parameters. Both methods are benchmarked against patient-specific CFD in six cases spanning diverse LA function, including three patients with TIA or thrombus in the LAA and three patients free of events. Both approaches reproduce expected spatial and temporal trends, with higher residence time in the distal LAA and higher LAA residence time in cases with TIA or thrombus. IDT demonstrates the closest agreement with CFD across the full range of residence times and produces maps in seconds, facilitating clinical translation. 
In contrast, the PINN additionally recovers phase-dependent atrial flow structures, but tends to smooth and underestimate the highest residence-time regions and requires hours of training. Together, these results support a scalable workflow in which IDT enables rapid stasis screening from contrast CT, and PINNs provide a complementary pathway for detailed, patient-specific hemodynamic inference when full-field flow information is needed.
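As a rough illustration of the tank-in-series idea mentioned in the abstract above, the sketch below uses the textbook impulse response of N equal, well-mixed tanks in series, which is a gamma density whose mean equals the total residence time. The function names, the equal-tank assumption, and the numerical settings are illustrative assumptions; the preprint's spatially dependent parameters and PV-referenced impulse response are not reproduced here.

```python
import math


def tanks_in_series_impulse(t, n_tanks, tau):
    """Impulse response h(t) of n_tanks equal, well-mixed tanks in series.

    tau is the total mean residence time; h(t) is a gamma density with
    shape n_tanks and rate n_tanks / tau (textbook form, assumed here).
    """
    if t < 0:
        return 0.0
    rate = n_tanks / tau
    return (rate ** n_tanks) * (t ** (n_tanks - 1)) * math.exp(-rate * t) / math.factorial(n_tanks - 1)


def mean_residence_time(n_tanks, tau, dt=0.001, t_max_factor=20.0):
    """Estimate the first moment of h(t) with the trapezoid rule."""
    steps = int(t_max_factor * tau / dt)
    area = 0.0
    first_moment = 0.0
    prev_t = 0.0
    prev_h = tanks_in_series_impulse(0.0, n_tanks, tau)
    for i in range(1, steps + 1):
        t = i * dt
        h = tanks_in_series_impulse(t, n_tanks, tau)
        area += 0.5 * (prev_h + h) * dt
        first_moment += 0.5 * (prev_t * prev_h + t * h) * dt
        prev_t, prev_h = t, h
    # Normalizing by the area makes the estimate robust to truncation at t_max.
    return first_moment / area


if __name__ == "__main__":
    # Hypothetical values: 3 tanks, 2 s total mean residence time.
    print(round(mean_residence_time(3, 2.0), 3))
```

Increasing `n_tanks` narrows the response around `tau` (closer to plug flow), while `n_tanks = 1` gives the single-compartment exponential washout.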
Wan, S.-Y.; Chen, W.-Y.
Accurate segmentation of nasal and paranasal sinus structures from CT scans is critical for surgical planning and treatment evaluation in rhinology. However, the complex anatomical topology and thin-wall boundaries of these structures pose significant challenges for automated segmentation methods. We propose AFS-DSN (Adaptive Frequency-Spatial Dual-Stream Network), a novel deep learning architecture that integrates multi-scale wavelet decomposition with spatial feature learning for binary segmentation of the nasal cavity complex. Our method employs a dual-stream encoder with a frequency branch utilizing three wavelet scales (db1, db2, db4) to capture 24 frequency sub-bands, enabling enhanced boundary detection in anatomically challenging regions. Cross-domain attention and adaptive routing mechanisms dynamically fuse spatial and frequency features based on local tissue characteristics. We formulate the task as binary segmentation where all five anatomical structures (maxillary sinus, sphenoid sinus, ethmoid sinus, frontal sinus, and nasal cavity) are treated as a unified foreground region against the background, prioritizing clinical boundary detection over individual structure differentiation. Evaluated on the NasalSeg dataset (130 CT volumes) with a 70/15/15 train/validation/test split, AFS-DSN achieves 94.34% ± 2.30% overall Dice coefficient with statistically significant improvements in thin-wall regions (91.34% vs. 90.57% baseline, p=0.004) and statistically significant improvement in Surface Dice at 1mm tolerance (0.874 vs. 0.868 baseline, p=0.010), demonstrating enhanced boundary precision while maintaining sub-second inference time, making the method suitable for surgical planning applications where sub-millimeter accuracy is clinically relevant. 
To address concerns regarding model complexity, we further introduce AFS-DSN-Lite, a parameter-efficient variant (27.41M parameters) that achieves comparable performance (94.37% Dice) through depthwise separable convolutions, and validate robustness via 3-fold cross-validation (mean Dice: 94.59% ± 0.31%).
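For readers unfamiliar with the Dice coefficient reported in the abstract above, a minimal sketch of the standard definition for binary masks follows. The flat-list input format and the convention of returning 1.0 for two empty masks are assumptions made for illustration; the Surface Dice variant with a distance tolerance is not shown.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |A intersect B| / (|A| + |B|) for two binary masks.

    pred and target are equal-length flat sequences of 0/1 (or bool)
    values, e.g. a flattened CT segmentation mask.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    pred_bits = [bool(v) for v in pred]
    target_bits = [bool(v) for v in target]
    intersection = sum(1 for p, t in zip(pred_bits, target_bits) if p and t)
    total = sum(pred_bits) + sum(target_bits)
    if total == 0:
        # Both masks are empty; treated here as perfect agreement (assumed convention).
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    # Toy masks: the prediction covers two voxels, one of which is correct.
    print(dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0]))
```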
Yu, S.; Wang, H.; Wang, N.; Chen, S.; Wu, J.; Yuan, Z.; Qi, T.; Zhou, Z.; Xia, F.; Ma, J.; Zhou, Y.
Biomedical image segmentation is a fundamental problem in computational biomedicine that aims to precisely delineate anatomical and biological structures, tissue types, or pathological regions in biomedical images. Accurate segmentation is essential for interpretation, decision-making, and quantitative analysis across a wide range of biological and medical applications. Over the past decade, the field has undergone a profound paradigm shift, evolving from task-specific specialist models to universal foundation models. This review provides an in-depth analysis of the evolution, tracing how the limitations of local discriminative learning drove the transition toward transformer-based global modeling, and large-scale generative pre-training. To help navigate the diverse landscape of interaction paradigms, we introduce the first systematic taxonomy of promptable biomedical image segmentation, categorizing existing methods into six distinct types, enabling users to intuitively select appropriate prompting strategies based on visual demonstrations and quickly pinpoint relevant literature (Prompt Type Visualization). Beyond model architectures, we discuss parallel advancements in dataset development, evaluation protocols, and application-specific adaptations across radiology, pathology, and biology. Integrating these powerful foundation models with rigorous domain-specific adaptation has great potential to improve patient outcomes and healthcare efficiency. Finally, we highlight key challenges in trustworthiness and clinical integration that must be overcome to realize the potential of the next generation of biological and medical generalists.
Huo, H.; Xu, Y.; Yao, R.; Lowerison, M.; Song, P.; Yao, J.
Three-dimensional photoacoustic tomography (3D-PAT) enables noninvasive structural and functional imaging with optical absorption contrast and ultrasonic detection depth. However, its spatial resolution is limited by acoustic diffraction, and incomplete detection geometry can substantially degrade image fidelity and quantitative accuracy. Here, we present a ULM-guided model-based reconstruction framework, termed 3D-PAULMprior, that incorporates sub-diffraction vascular priors from concurrent ultrasound localization microscopy (ULM) into 3D photoacoustic reconstruction. The method uses weighted regional Laplacian regularization to integrate high-resolution vascular information into the inverse problem, thereby enhancing vascular sharpness, suppressing limited-view artifacts, and improving blood oxygen saturation estimation. We validated 3D-PAULMprior using numerical simulations, tissue-mimicking phantoms, and in vivo mouse brain imaging. Compared with conventional reconstruction, 3D-PAULMprior improved spatial resolution by over 50%, increased contrast-to-noise ratio by 261.2%, and enhanced structural similarity index by 24.6%. In vivo, 3D-PAULMprior recovered vascular structures that were poorly resolved or missing in conventional reconstructions and produced more spatially confined sO2 maps. These results establish 3D-PAULMprior as a robust multimodal reconstruction strategy for high-resolution structural and functional photoacoustic imaging.
Magdoom, K. N.; Sarlls, J. E.; Basser, P.
The majority of MR-based brain imaging methods provide macroscopic information averaged over the entire imaging voxel. Yet tissue composition and microstructure are heterogeneous within the cubic millimeter-sized MRI voxel that contains numerous distinct water pools at mesoscopic, microscopic, and nanoscopic length scales. Accurately measuring their individual characteristics in live human brain has the potential to reveal hidden salient meso/micro-structural features and uncover subtle changes that may occur in development, neurological disorders, trauma, etc. Nevertheless, because of many technical and scientific challenges, there is a dearth of robust, quantitative methods to probe tissue water dynamics at these subvoxel length scales. Here we present a novel empirical spectroscopic diffusion MRI method that estimates the probability density function (pdf) of diffusion tensors, i.e., the diffusion tensor distribution (DTD) in the human brain in vivo. Our method entails performing a multi-dimensional Inverse Laplace Transform (ILT), which is generally an ill-posed and ill-conditioned problem. However, we overcome these obstacles using a hierarchy of lower-dimensional marginal distributions of the DTD estimated from diffusion weighted (DW) signals obtained from single, double, and triple pulsed-field gradient (PFG) experiments. Iteratively applying this hierarchy of marginal distributions progressively shrinks the space of admissible solutions. We extensively vet this framework with simulated DWI data obtained from realistic DTD motifs that mimic different cell and tissue properties seen in the brain. We then experimentally test our approach in vivo in brains of healthy normal human subjects. We segment the reconstructed DTD within a voxel to identify signatures of different tissue and cell types, and cluster these DTDs to identify various water pools. 
We use the high-dimensional spectrum to robustly remove the free water compartment that often confounds tissue microstructure. We take ensemble averages of invariants of the micro-diffusion tensors, and measure and map their distributions to visualize salient intrinsic mesoscopic features. Since DTD MRI subsumes DTI, we also compute the family of DTI-derived quantitative imaging biomarkers from the moments of the distributions of the mean diffusivity and FA derived from the DTD. Our approach has great translational potential, revealing new microstructural features not previously observed in in vivo MRI.
Lee, S.; Shivaei, S.; Shapiro, M. G.
Ultrasound is emerging as a method for molecular and cellular imaging by connecting the versatile physics of sound waves to protein-based contrast agents such as gas vesicles (GVs). BURST is a common imaging mode that leverages the strong, transient echoes generated when GVs collapse under acoustic pressure to enable highly sensitive ultrasound visualization of cells and biomolecules, down to the single cell level. However, BURST is vulnerable to fluctuating background signals, with large-amplitude fluctuations in scattering, as often present in vivo, obscuring genuine GV responses. In this study, we mathematically examine this limitation and show that incorporating statistical metrics such as correlation or temporal contrast-to-noise ratio effectively suppresses unwanted non-GV voxels and quantifies detection confidence, including in image sequences in which GV collapse spans multiple frames. Compared with prior methods, our approach enhances the clarity of BURST images and provides probabilistic interpretations of GV signals, facilitating more reliable analysis of ambiguous in vivo molecular imaging, as we demonstrate in imaging tumor-homing probiotics and gene expression in the brain.
Ma, S.; Xu, M.; Dao, M.; Li, H.
Microscopy-based analysis of red blood cell (RBC) morphology is widely used to study phenotypes in sickle cell disease (SCD). Although AI models have been developed to automate classification, most are trained on pre-cropped single-cell images and thus struggle with full-scope microscopic images containing densely packed cells and diverse morphologies, which require both accurate detection and fine-grained classification. We propose an end-to-end computational framework to identify individual RBCs in full-scope microscopy images and classify them into five morphological categories: discocytes (DO), echinocytes (E), elongated and sickle-shaped cells (ES), granular cells (G), and reticulocytes (R). We first evaluate advanced detection-classification models, including You Only Look Once (YOLO) and Detection Transformers (DETR), and demonstrate that while these models effectively detect cells, their classification performance falls short of specialized classifiers trained on single-cell images, particularly for minority phenotypes. To address this limitation, we introduce a two-step framework in which a YOLO-based detector localizes and crops individual cells from full-scope images, followed by a fine-tuned DenseNet121 ensemble classifier that assigns each cell to one of the five morphological categories. The proposed framework achieves a detection-level F1-score of 0.9661 and a weighted-average classification F1-score of 0.9708, with an overall classification accuracy of 97.06%. Compared with the single-step YOLO26n baseline, the two-step pipeline yields a macro-average F1-score improvement of +0.1675, with particularly substantial gains for minority classes (E: +0.1623; G: +0.2774; R: +0.2603). 
Overall, this hybrid framework demonstrates a practical strategy for adapting fast, general-purpose detection models to domain-specific biomedical tasks by combining them with specialized classifiers, delivering both efficiency and high accuracy for scientific and clinical image analysis.
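The abstract above reports both macro-average and weighted-average F1-scores. The sketch below shows how the two averages differ when classes are imbalanced. The per-class counts are hypothetical and are not taken from the preprint; the function names are likewise illustrative.

```python
def f1_from_counts(tp, fp, fn):
    """Per-class F1 from true positives, false positives, false negatives."""
    denominator = 2 * tp + fp + fn
    return 0.0 if denominator == 0 else 2 * tp / denominator


def macro_and_weighted_f1(counts):
    """counts maps class name -> (tp, fp, fn).

    Macro F1 averages per-class F1 with equal weight per class; weighted F1
    weights each class by its support (tp + fn), so majority classes dominate.
    """
    per_class = {name: f1_from_counts(*c) for name, c in counts.items()}
    support = {name: c[0] + c[2] for name, c in counts.items()}
    macro = sum(per_class.values()) / len(per_class)
    total_support = sum(support.values())
    weighted = sum(per_class[n] * support[n] for n in counts) / total_support
    return macro, weighted


if __name__ == "__main__":
    # Hypothetical counts: one majority class and one minority class.
    example = {"majority": (90, 10, 10), "minority": (5, 5, 5)}
    macro, weighted = macro_and_weighted_f1(example)
    print(round(macro, 4), round(weighted, 4))
```

Because the macro average gives every class equal weight, gains on minority phenotypes move it much more than they move the weighted average.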
Garay, G.; Barolin, J.; Sorriba, V.; Damian, J. P.; Kou, Z.; Oelze, M.; Negreira, C.; Kun, A.; Brum, J.
Null Subtraction Imaging (NSI) is a nonlinear beamforming approach that combines multiple receive apodizations and subtraction to improve spatial resolution in ultrasound imaging. In NSI, a DC offset parameter is introduced in the apodization design to control the sharpening of the effective beam pattern and, therefore, the degree of spatial-resolution enhancement. Here, we investigate the use of NSI in functional ultrasound (fUS) imaging of the mouse brain and compare its performance with conventional delay-and-sum (DAS) beamforming across a range of DC offset values. fUS acquisitions were performed in three anesthetized wild-type mice during periodic vibrissae stimulation. Activation maps were computed by correlating cerebral blood volume (CBV) signals with the stimulation pattern. Activation area, edge gradient, Dice similarity coefficient, and signal-to-noise ratio (SNR) were used to evaluate spatial localization, boundary sharpness, vascular alignment, and signal stability, respectively. NSI yielded more spatially confined activation maps than DAS and produced sharper activation boundaries. However, for low DC offsets (DC < 0.5), the CBV signal exhibited increased fluctuations, which reduced temporal stability and limited the reliability of the functional maps. As the DC offset increased, temporal SNR improved, while the spatial-resolution gain progressively decreased. In our imaging configuration, intermediate DC values (DC ≈ 0.5) provided the most favorable compromise between improved spatial localization and sufficient temporal stability for reliable functional activation detection. These results demonstrate the feasibility of applying NSI to functional ultrasound imaging and provide a quantitative framework for selecting the DC parameter in fUS studies.
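The activation maps described in the abstract above are computed by correlating CBV signals with the stimulation pattern. A minimal sketch of that step, assuming a plain Pearson correlation per pixel and a fixed threshold, is given below. The threshold value, the dictionary layout, and the function names are illustrative assumptions, not details from the preprint.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def activation_map(cbv_by_pixel, stimulus, threshold):
    """Return {pixel: r} for pixels whose CBV time course correlates with
    the stimulus pattern at or above the threshold."""
    result = {}
    for pixel, series in cbv_by_pixel.items():
        r = pearson(series, stimulus)
        if r >= threshold:
            result[pixel] = r
    return result


if __name__ == "__main__":
    # Hypothetical on/off stimulation pattern and three pixel time courses.
    stimulus = [0, 0, 1, 1, 0, 0, 1, 1]
    pixels = {
        "responsive": [5 + 2 * s for s in stimulus],
        "flat": [3] * len(stimulus),
        "inverse": [1 - s for s in stimulus],
    }
    print(sorted(activation_map(pixels, stimulus, threshold=0.5)))
```

In practice the stimulus pattern is often delayed or convolved with a hemodynamic response before correlation; that step is omitted here.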
You, L.; Dang, H.; Wang, H.; Matta, E.; Zhou, X.
Image-based liver Couinaud segmentation is designed to automatically provide the locations of suspicious objects in liver CT/MR images. Once this is achieved, physicians can be guided to the target slice and area where the suspicious node is located. However, conventional algorithms trained primarily on healthy liver images often fail to generalize to Hepatocellular Carcinoma (HCC) cases due to pathological structural distortions. In this work, we propose a robust two-stage framework that integrates a 3D UNet with a 3D Anatomical Structure-Guided Graph Convolutional Network (3D GCN). This two-stage strategy effectively isolates the liver volume to eliminate structural noise from neighboring organs, such as the spleen, allowing the framework to focus exclusively on the complex 3D anatomical relationships among the eight segments. To ensure the topological consistency required for global spatial reasoning, we implement a standardized preprocessing pipeline that normalizes liver-only volumes to exactly 50 frames along the z-axis. By combining a lightweight 3D UNet backbone with the 3D GCN for refined boundary reasoning, our model demonstrates superior generalization performance on unseen clinical datasets, achieving a mean Dice score of 0.828 in blind testing. By releasing our code and pretrained weights, we aim to provide the first publicly available deep learning resource for robust Couinaud segmentation.
Xie, C.; Wang, Y.; Li, D.; Yu, B.; Peng, S.; Wu, L.; Yang, M.
Handheld ultrasound devices have revolutionized point-of-care diagnostics, but their effectiveness remains limited by operator dependency and the need for specialized training. This paper presents an intelligent guidance and diagnostic assistance system for the handheld wireless ultrasound device, enabling automated carotid artery and thyroid examinations through handheld operation. Drawing inspiration from the Actor-Critic framework, we implement a simulation-based reinforcement learning approach for real-time probe navigation toward standard anatomical views. The system integrates YOLOv8n-based detection networks for carotid plaque and thyroid nodule identification, achieving real-time inference at 30 frames per second. Furthermore, we propose a hybrid measurement approach combining UNet segmentation with the Snake algorithm for precise biometric quantification, including carotid intima-media thickness (IMT), lumen diameter, and lesion dimensions. Experimental validation on clinical datasets demonstrates that the proposed system achieves 91.2% accuracy in standard plane acquisition, 87.5% mean average precision (mAP) for plaque detection, and 89.3% mAP for nodule identification. Measurement results show excellent agreement with expert sonographers, with IMT measurements exhibiting a mean absolute difference of 0.08 mm. These findings demonstrate the feasibility of intelligent handheld ultrasound examination, significantly reducing operator dependency while maintaining diagnostic accuracy comparable to experienced clinicians.
Qiu, P.; An, Z.; Ha, S.; Kumar, S.; Yu, X.; Sotiras, A.
Multimodal medical image analysis exploits complementary information from multiple data sources (e.g., multi-contrast Magnetic Resonance Imaging (MRI), Diffusion Tensor Imaging (DTI), and Positron Emission Tomography (PET)) to enhance diagnostic accuracy and support clinical decision making. Central to this process is the learning of robust representations that capture both modality-invariant and modality-specific features, which can then be leveraged for downstream tasks such as MRI segmentation and normative modeling of population-level variation and individual deviations. However, learning robust and generalizable representations becomes particularly challenging in the presence of missing modalities and heterogeneous data distributions. Most existing methods address this challenge primarily from a statistical perspective, yet they lack a theoretical understanding of the underlying geometric behavior, such as how probability mass is allocated across modalities. In this paper, we introduce a generalized geometric perspective for multimodal representation learning grounded in the concept of barycenters, which unifies a broad class of existing methods under a common theoretical perspective. Building on this barycentric formulation, we propose a novel approach that leverages generalized Wasserstein barycenters with hierarchical modality-specific priors to better preserve the geometry of unimodal distributions and enhance representation quality. We evaluated our framework on two key multimodal tasks, brain tumor MRI segmentation and normative modeling, demonstrating consistent improvements over a variety of multimodal approaches. Our results highlight the potential of scalable, theoretically grounded approaches to advance robust and generalizable representation learning in medical imaging applications.
Tustison, N. J.; Avants, B. B.; Cook, P. A.; Gee, J. C.; Stone, J. R.
In modeling complex probability distributions, normalizing flows provide exact-likelihood, bijective mappings between empirical data and tractable latent spaces. Building on this foundation, latent-aligned multiview normalizing (LAMNr) flows leverage these salient properties to learn shared latent subspaces across heterogeneous, multimodal datasets while simultaneously topologically unfolding the sampled data manifold into a continuous vector space. Formal latent-alignment constraints are used to model shared structural features separate from view-specific variations, coordinating latent projections into a shared geometric subspace. By applying this transformation in the context of biological imaging, the framework establishes a potential basis for a deep learning interpretation of foundational computational anatomy concepts, such as the population template, latent distances, and geodesic pairwise image interpolation. Additionally, the proposed framework enables closed-form conditional modeling for exact cross-view imputation and other latent space manipulations. Evaluations and illustrations on both imaging-derived phenotypes (IDPs) and multimodal MRI demonstrate the proposed framework and potential applications. To further motivate our work, we provide a robust and comprehensive, 2D and 3D open-source implementation in PyTorch, natively integrated with the ANTsX ecosystem (i.e., ANTsTorch) for efficient training and subsequent data transformation, manipulation, and analysis.
Yang, K.; Shi, P.; Huang, H.; Musio, F.; Baazaoui, H.; Aydin, O. U.; Hilbert, A.; Hamadache, R. E.; Yalcin, C.; Zhang, M.; Falcetta, D.; de la Rosa, E.; Shit, S.; Prabhakar, C.; Wittmann, B.; Rokuss, M. R.; Kirchhoff, Y.; Al-Maskari, R.; Hoeher, L.; Juchler, N.; Casamitjana, A.; Cleary, J.; Schmick, A.; Baumgartner, P.; Deseoe, J.; Vandans, O.; Lee, D.; Oh, K.; LaBella, D.; Mazher, M.; Niederer, S. A.; Qayyum, A.; Liu, Y.; Chen, J.; Kim, W.; Asawalertsak, N.; Kim, M.; Shin, D.; Park, S.-H.; Kikuchi, S.; Zhang, Y.; Liu, J.; Cui, Y.; Qiu, Y.; Verschuur, A.; Zhang, J.; van der Schaaf, I.; Su, R.;
We present the TopBrain 2025 Challenge, the first benchmark for fine-grained multiclass segmentation of the whole brain vasculature in both computed tomography angiography (CTA) and magnetic resonance angiography (MRA). Building on the TopCoW challenge, TopBrain scales vessel annotation from the Circle of Willis to the entire brain, introducing a dataset of 90 annotated volumes across 48 landmark vessel classes spanning arterial and venous systems, of which 50 training volumes are publicly released. Vessel definitions were consolidated from established neuroanatomical references into a unified annotation scheme, and vessel caliber measurements along the centerline are reported for the first time across the whole brain vascular anatomy. To address the unique challenges of multiclass brain vessel segmentation, we propose an evaluation framework that accounts for detection in segmentation performance, assesses anatomical plausibility, and introduces novel contamination metrics that characterize inter-class prediction errors. Fifteen teams from over 220 registered participants submitted algorithms to the benchmark. The top-performing teams built on nnUNet with principled system design choices, achieving around 80% Dice scores, near-zero invalid neighbor counts, over 60% F1 scores for side-road vessels, and below 18% foreground contamination ratio. Larger vessels are easier to segment, while smaller and more complex vessels remain the true bottleneck. The annotated datasets and podium-finish algorithms are made publicly available on Zenodo.
Koshe, A.; Sobhani Tehrani, E.; Jalaleddini, K.; Motallebzadeh, H.
Quantifying the diagnostic dispersion of inferred parameter distributions is a challenge in uncertainty-aware modeling. Scalar summaries such as credible interval width are topology-blind; fundamentally different posterior morphologies can yield identical scores, obscuring whether a parameter is precisely estimated or constrained to a range. We propose a Composite Certainty Framework that addresses this metric degeneracy by aggregating five complementary uncertainty metrics: interquartile range, standard deviation, full width at half maximum, Shannon entropy, and mass width. These metrics are aggregated through non-parametric Borda rank voting into a single, unitless consensus certainty score. Applied to a simulation-based inference pipeline for a finite-element model of the human middle ear tuned to cadaveric acoustic measurements, the framework reveals parameter-specific identifiability profiles invisible to any individual metric. It produces two actionable clinical thresholds: (1) the maximum tolerable measurement noise for reliable parameter recovery, and (2) the minimum simulation budget for posterior convergence. We demonstrated that no single metric captures all aspects of posterior dispersion, as spread-based metrics and entropy diverge systematically for clinically critical parameters, whereas their aggregation produces a consensus reflecting genuine diagnostic certainty. The framework is generalizable to any model-based diagnostic pipeline where the posterior distribution, not merely its coverage, determines clinical certainty.
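The Borda rank voting step described in the abstract above can be sketched as follows. This is a generic Borda count under stated assumptions, not the preprint's implementation: each metric ranks the parameters from least to most dispersed, lower dispersion earns more points, ties are broken by parameter name, and the total is normalized to the range 0 to 1. The metric values and parameter names in the example are hypothetical.

```python
def borda_certainty(metric_values):
    """Aggregate several dispersion metrics into one consensus score per parameter.

    metric_values maps metric name -> {parameter name: dispersion value}.
    Within each metric, the parameter with the smallest dispersion gets
    n - 1 points and the largest gets 0 (ties broken by parameter name,
    an assumption made for determinism). Scores are normalized to [0, 1].
    """
    metrics = list(metric_values.values())
    parameters = sorted(metrics[0].keys())
    n = len(parameters)
    points = {p: 0 for p in parameters}
    for values in metrics:
        ranked = sorted(parameters, key=lambda p: (values[p], p))
        for position, parameter in enumerate(ranked):
            points[parameter] += (n - 1) - position
    max_points = len(metrics) * (n - 1)
    if max_points == 0:
        return {p: 1.0 for p in parameters}
    return {p: points[p] / max_points for p in parameters}


if __name__ == "__main__":
    # Hypothetical dispersion values for three model parameters.
    example = {
        "iqr": {"stiffness": 0.1, "damping": 0.5, "mass": 0.3},
        "entropy": {"stiffness": 1.0, "damping": 2.0, "mass": 3.0},
    }
    print(borda_certainty(example))
```

Because only ranks enter the score, metrics with different units (for example entropy in nats and widths in physical units) can be combined without rescaling.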
Jia, Y.; Niu, J.; Qie, Z.; Li, Z.; Laine, A. F.; Guo, J.
Accurate classification of brain tumors from MRI is critical for guiding clinical decision-making; however, existing deep learning models are often hindered by limited interpretability and pronounced sensitivity to hyperparameter selection, which constrain their reliability in medical settings. To address these challenges, we propose TumorCLIP, a lightweight and training-efficient vision-language framework that integrates radiology-informed text prototypes with a DenseNet-based visual encoder to support clinically meaningful semantic reasoning, fused via a Tip-Adapter mechanism. TumorCLIP does not aim to introduce a new vision-language model architecture. Instead, its contribution lies in the integration of radiology-informed text prototypes tailored to MRI interpretation, a systematic evaluation of backbone stability across diverse visual architectures, and a lightweight, training-efficient CLIP-based fusion framework designed for medical imaging applications. We first conduct a comprehensive unimodal benchmark across eight representative visual backbones (EfficientNet-B0, MobileNetV3-Large, ResNet50, DenseNet121, ViT, DeiT, Swin Transformer, and MambaOut) using a standardized optimizer and learning-rate grid search, revealing performance swings exceeding 60 percentage points depending on hyperparameter choices. DenseNet121 shows the strongest stability-accuracy trade-off within our evaluated optimizer and learning-rate grid (97.6%). Leveraging this foundation, TumorCLIP fuses image features with frozen CLIP-derived text prototypes, achieving concept-level explainability, robust few-shot adaptation, and enhanced classification of minority tumor classes. On the test set, TumorCLIP attains 98.5% accuracy, including a +1.86 percentage point recall increase for Neurocytoma, suggesting that radiology-informed textual priors can improve semantic alignment and help refine diagnostic decision boundaries within the evaluated setting. 
Additional evaluation on an independent external dataset shows that TumorCLIP achieves improved cross-dataset performance under the evaluated distribution shift, relative to the unimodal DenseNet121 baseline. These results demonstrate TumorCLIP as a practical, interpretable, and data-efficient alternative to conventional visual classifiers, providing evidence for radiology-aware vision-language alignment in MRI-based brain tumor classification. All results are reported within the evaluated datasets and training protocols.
Kaur, M.; Abbasi, H.; McMorland, A. J.
Accurate pose estimation is central to automated infant General Movements Assessment during the fidgety period, when subtle limb movements, particularly at distal joints, inform neurodevelopmental risks. Robust 2D pose tracking from handheld videos remains challenging in real-world settings, where occlusion, rapid motions, and visually ambiguous smaller joints frequently compromise anatomical accuracy. We present CRADLE, a clinically motivated, anatomy-aware post-processing pipeline designed to refine infant 2D movement trajectories across 24 anatomical landmarks detected by our DeepLabCut-trained model. CRADLE integrates segment-length constraints, velocity-based anomaly detection, anatomically constrained interpolation, and Kalman filtering to correct both large localization failures and subtle persistent joint misplacements without relying primarily on confidence scores. Evaluations against conventional Confidence-Thresholding using Mean Absolute Error (MAE), ΔMAE, average Percentage of Correct Keypoints, and net keypoint correction rate showed consistently reduced or preserved error while maintaining accurate trajectories, with the strongest gains achieved at clinically important distal joints. Mean improvements reached up to 5 pixels for some smaller distal landmarks, large-magnitude corrections occurred more often than with Confidence-Thresholding, and well-localised joints remained largely unaffected. Positive net correction rates across metacarpophalangeal and metatarsophalangeal distal landmarks further confirmed a favourable correction-degradation balance. By improving pose trajectory quality, CRADLE enhances the reliability of downstream movement analysis.
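As a rough illustration of what velocity-based anomaly detection followed by interpolation can look like for a single landmark, here is a minimal sketch. It is not the CRADLE implementation: the function name `repair_velocity_outliers`, the `max_speed` threshold, and the use of plain linear interpolation (rather than anatomically constrained interpolation or Kalman filtering) are all assumptions made for illustration, and the first frame is assumed to be correctly localised.

```python
import numpy as np

def repair_velocity_outliers(xy, max_speed):
    """Flag frames whose implied speed is implausible and re-fill them.

    xy        : (n_frames, 2) array of pixel coordinates for one landmark.
    max_speed : hypothetical threshold in pixels per frame (an assumption).
    Returns the repaired trajectory and a boolean mask of flagged frames.
    """
    xy = np.asarray(xy, dtype=float).copy()
    n = len(xy)
    good = np.ones(n, dtype=bool)
    last = 0  # index of the most recent trusted frame (frame 0 is assumed good)
    for t in range(1, n):
        # Speed is measured relative to the last trusted frame, so a single
        # jump does not also condemn the frame that follows it.
        speed = np.linalg.norm(xy[t] - xy[last]) / (t - last)
        if speed > max_speed:
            good[t] = False
        else:
            last = t
    idx = np.arange(n)
    for d in range(xy.shape[1]):
        # Plain linear interpolation between trusted frames (a simplification).
        xy[~good, d] = np.interp(idx[~good], idx[good], xy[good, d])
    return xy, ~good

track = [[0, 0], [1, 0], [50, 40], [3, 0], [4, 0]]
fixed, flagged = repair_velocity_outliers(track, max_speed=5.0)
```

In this toy trajectory only the third frame is flagged, and it is replaced by the midpoint of its trusted neighbours. A confidence score is never consulted, which mirrors the abstract's point that correction need not rely primarily on detector confidence.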
McGarraugh, C.; Menozzi, L.; Yao, R.; Eng-Wu, D.; Nguyen, V. T.; Cho, S.-W.; Francis, S.; Yao, J.
Quantitative molecular imaging in photoacoustics is fundamentally limited by the ill-posed nature of spectral unmixing, where spectral overlap, noise, and unknown fluence introduce bias in conventional inversion-based methods. We introduce photoacoustic fingerprinting (PAF), a framework that reframes spectral unmixing as a fingerprint recognition problem. PAF interprets multispectral signals as high-dimensional fingerprints encoding both molecular composition and measurement distortions. Inspired by magnetic resonance fingerprinting, PAF uses a recurrent neural network trained on synthetic data spanning realistic mixtures, noise levels, and fluence variations to directly infer molecular concentrations from spectral shape. PAF enables accurate and robust quantification in regimes where conventional methods break down, including low signal-to-noise conditions, spectrally correlated mixtures, and unknown fluence distortions. In controlled simulations, PAF consistently outperformed non-negative least squares, with the largest gains observed for spectrally overlapping chromophores such as collagen. In phantom studies, PAF improved molecular specificity by correctly localizing collagen and recovering water contrast despite similar spectral reconstructions. In ex vivo mouse livers, PAF detected lipid accumulation associated with steatosis, and in human arteries, it identified molecular signatures consistent with thrombus and lipid-rich plaque. These results establish PAF as a generalizable framework for label-free molecular imaging and a promising step toward quantitative photoacoustic diagnostics.
Su, H.; Fan, W.; Peng, J.; Zhang, Y.
High bit-depth medical images preserve subtle intensity variations that are important for quantitative analysis and clinical interpretation, but their large dynamic range poses challenges for efficient compression. We propose a bit-plane-aware dual-stream compression framework for 16-bit medical images that separately models the most significant bit (MSB) and least significant bit (LSB) components. The MSB structural stream is encoded using JPEG coding with a Duplicate Segment Skipping (DSS) strategy to exploit spatial and segment-level redundancy, while the LSB detail stream is compressed using learned image compression to represent residual variations and fine-grained details. Experiments on four MRI and CT datasets show that the proposed method consistently outperforms representative traditional and learning-based codecs, achieving the lowest bit rate across all datasets while preserving high reconstruction fidelity. As a downstream application, we further demonstrate that the compressed bitstreams can be effectively integrated with DNA encoding and converted into sequences with favorable biochemical properties.
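The MSB/LSB separation that the dual-stream design starts from can be sketched in a few lines. This is only an illustration: the abstract does not state where the split falls, so the 8-bit/8-bit split below is an assumption, and the helper names `split_bit_planes` and `merge_bit_planes` are hypothetical. The JPEG, DSS, and learned-compression stages are not shown.

```python
import numpy as np

def split_bit_planes(image16):
    """Split a 16-bit image into two 8-bit streams (assumed 8/8 split)."""
    image16 = np.asarray(image16, dtype=np.uint16)
    msb = (image16 >> 8).astype(np.uint8)    # upper byte: coarse structure
    lsb = (image16 & 0xFF).astype(np.uint8)  # lower byte: fine detail
    return msb, lsb

def merge_bit_planes(msb, lsb):
    """Recombine the two 8-bit streams into the original 16-bit image."""
    return (msb.astype(np.uint16) << 8) | lsb.astype(np.uint16)

image = np.array([[0, 255, 256], [4095, 40000, 65535]], dtype=np.uint16)
msb, lsb = split_bit_planes(image)
restored = merge_bit_planes(msb, lsb)
```

The split itself is lossless, so `restored` equals `image` exactly. Any loss in a pipeline of this kind would come from how each stream is subsequently coded, not from the separation step.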